# REPORT DOCUMENTATION PAGE

Form Approved OMB NO. 0704-0188

Public Reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.

| Field | Entry |
|-------|-------|
| 1. AGENCY USE ONLY (Leave Blank) | |
| 2. REPORT DATE | December 31, 2002 |
| 3. REPORT TYPE AND DATES COVERED | Final Report, Jan. 2001 - Dec. 2002 |
| 4. TITLE AND SUBTITLE | Wavefront Coded Microscope and Real-Time Processor, Final Report |
| 5. FUNDING NUMBERS | DAAD19-01-C-0004 |
| 6. AUTHOR(S) | Gregory E. Johnson |
| 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) | CDM Optics, Inc., 4001 Discovery Dr., Suite 2110, Boulder, Colorado 80303 |
| 8. PERFORMING ORGANIZATION REPORT NUMBER | ARO.12.2002.F |
| 9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES) | U. S. Army Research Office, P.O. Box 12211, Research Triangle Park, NC 27709-2211 |
| 10. SPONSORING / MONITORING AGENCY REPORT NUMBER | 41676-PH-STZ |
| 11. SUPPLEMENTARY NOTES | The views, opinions and/or findings contained in this report are those of the author(s) and should not be construed as an official Department of the Army position, policy or decision, unless so designated by other documentation. |
| 12 a. DISTRIBUTION / AVAILABILITY STATEMENT | Approved for public release; distribution unlimited. |
| 12 b. DISTRIBUTION CODE | |

13. ABSTRACT (Maximum 200 words)

A Wavefront Coded microscope and real-time processor were developed and tested. Algorithms developed during this work led to compact implementations yielding over 30 billion operations per second for reconstruction of a Wavefront Coded image. Example results were obtained in the laboratory and are provided in this report.

20030403 009

| Field | Entry |
|-------|-------|
| 14. SUBJECT TERMS | Wavefront Coding |
| 15. NUMBER OF PAGES | |
| 16. PRICE CODE | |
| 17. SECURITY CLASSIFICATION OF REPORT | UNCLASSIFIED |
| 18. SECURITY CLASSIFICATION OF THIS PAGE | UNCLASSIFIED |
| 19. SECURITY CLASSIFICATION OF ABSTRACT | UNCLASSIFIED |
| 20. LIMITATION OF ABSTRACT | UL |

# ARO STTR Phase 2 - Final Report Wavefront Coded Microscope and Real-Time Processor

CDM Optics, Inc. - December 31, 2002

## **Table of Contents**

- Executive Summary
- Statement of the Problem
- Summary of Results
  1. Opto-Mechanical
  2. Signal Processing
  3. Example Results
- Publications
  1. Technical Reports to ARO
  2. Technical Paper Submissions
- Scientific Personnel
- Report of Inventions
  1. New Wavefront Coding Imaging Systems with Optimized Image Processing

## **Executive Summary**

Wavefront Coding (WFC) was used during Phase 1 to demonstrate a revolutionary system for digital microscopy. WFC employs specialized aspheric optics and digital processing to greatly increase the depth of field of an imaging system. The benefits of WFC come with the burden of post-processing, since the raw captured images are always blurred by the WFC optics. The goal in Phase 2 was to reduce the processing burden through better optical designs, and to develop a real-time hardware processor based on field-programmable gate arrays (FPGAs).

There were three primary objectives in the Phase 2 effort: i) to develop a WFC microscope, ii) to design real-time WFC processors for high-magnification live-cell microscopy and high-speed manufacturing/inspection systems, and iii) to produce optical designs representative of next-generation WFC microscope objectives. The first two objectives were met and the results surpassed the expectations in the proposal. Example images using the FPGA-equipped Wavefront Coded microscope were obtained in the laboratory and are included in this report. The third objective was not accomplished due to unforeseen issues in integrating the FPGA development system, which fully consumed the available time and resources.

## Statement of the Problem

Biological microscopy, metallurgical, and machine-vision systems rely on high-magnification imaging for tasks ranging from medical slide scanning to real-time manufacturing control. In many of these processes the object under investigation is moving or cannot be placed precisely in the field of regard. Such cases demand a high-quality imaging response from the optics over a broad region in space. The ability of high-resolution objectives to also provide a large depth of field is limited by traditional lens design techniques and available materials. Wavefront Coding brings a new paradigm to the microscope by enabling large depths of field without reducing aperture sizes or requiring expensive optical materials. By combining aspheric optics with hardware-based digital signal processing, high-magnification, high-resolution images can be obtained in real time over a large depth of field at a reasonable cost and device size.

Real-time Wavefront Coded image production for large-format microscope applications requires over ten billion multiply-adds per second. The minimum data rate for modest-sized microscope images is approximately 1k x 1k pixels per frame at 10 frames per second, or 10 megapixels per second. Consider a PSF for a WFC microscope system that is 32 pixels long in each spatial dimension. Such a system requires a filter kernel of 32 x 32 elements, or 1024 (about 1k) coefficients. It therefore requires about 1k computations per pixel at 10M pixels per second, or about 10 billion multiply-accumulates (MACs) per second. Future biological applications such as live-cell fluorescence microscopy and high-volume slide scanning are expected to require kernels exceeding 100x100 coefficients in size.

Some Wavefront Coded optical systems can be processed in a separable fashion, where the columns of the image are filtered independently of the rows. Such filtering is more efficient mathematically than two-dimensional, or non-separable, spatial filtering. A rectangularly separable design similar to that described above would contain 32+32=64 coefficients and require 640 million MACs per second. Both separable and non-separable processing have uses in WFC reconstruction, and both kinds of system are useful for microscopy. The processing must also be scalable in image size, kernel size, and frame rate, as future WFC microscope systems for biological and medical use can approach 4k x 4k sensors operating at 25 to hundreds of frames per second.
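The arithmetic in the two paragraphs above can be checked with a few lines of Python. This is only a sketch: the function name `macs_per_second` and its parameters are illustrative and do not come from the report, and the 1k x 1k frame is taken as 1000 x 1000 pixels to match the report's round figures.

```python
def macs_per_second(width, height, fps, kernel_size, separable=False):
    """Multiply-accumulates per second needed to filter every pixel.

    A non-separable K x K kernel needs K*K MACs per pixel; a rectangularly
    separable design needs K (rows) + K (columns) MACs per pixel.
    """
    pixels_per_second = width * height * fps
    taps = 2 * kernel_size if separable else kernel_size ** 2
    return pixels_per_second * taps


# Example from the text: 1k x 1k pixels, 10 frames/second, 32 x 32 kernel.
non_separable = macs_per_second(1000, 1000, 10, 32)
separable = macs_per_second(1000, 1000, 10, 32, separable=True)
print(f"non-separable: {non_separable:,} MACs/s")
print(f"separable:     {separable:,} MACs/s")
```

For this example the separable design needs 16 times fewer MACs per second (1024 coefficients versus 64), which is the efficiency gain the text refers to.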

## **Summary of Results**

Real-time stand-alone Wavefront Coded processors for two-dimensional filtering have been generated using current state-of-the-art silicon-based processing hardware on field-programmable gate arrays (FPGAs). The FPGA provided a hardware solution for both non-separable and separable processing. Processing speeds of up to 30 billion operations per second have been demonstrated on a one-million-gate FPGA device. This represents the ability to process VGA-sized Wavefront Coded images at over 100 frames per second.

The objectives in the proposal were split into two groups, opto-mechanical and signal-processing. The opto-mechanical tasks were intended to develop new optical components that would provide better responses than the current designs. New optics also could reduce the processing burden by being optimized for post-processing with an FPGA device. Signal processing tasks included generating both separable and non-separable designs. Integration issues with the FPGA devices consumed far more resources than budgeted and the optical portion of the work was not entirely completed.

#### 1. Opto-Mechanical

The opto-mechanical portion of this project was intended to identify, design, and manufacture optimal Wavefront Coded optics for retrofitting a microscope. Interaction with biological researchers and machine vision manufacturers during this project revealed that there is no single "optimal" solution, and that customers often bring their own unique challenges. While several configurations and designs could have been generated to satisfy many of the potential customers, the integration of the processing device consumed the available resources and this task was not undertaken.

Without designs, manufacturing and metrology efforts were not pursued. To demonstrate the effectiveness of the processor, existing separably-designed WFC elements were used in a laboratory microscope configuration. These elements are admittedly sub-optimal since they rely on a separable optical design and are optimized for separable processing, whereas the FPGA can perform non-separable processing.

Current problems with the existing WFC design for biological microscopes, clinical scanning, and machine vision microscopes were identified (objective 5). They include a lateral translation in the image that occurs with defocus. This effect is due to the separable nature of the existing cubic designs, and is not new to WFC systems. Its impact on biological and machine vision applications has prompted the specification and development of new optical design tools within CDM Optics to address this problem and others like it more efficiently in the future.

#### 2. Signal Processing

The signal processing tasks consumed most of the resources in the program. Algorithm development, core design, and integration of the development platforms took the majority of the effort. Considerable effort was required from both CDM and the University of Colorado for the integration of the Sundance FPGA development system.

The primary objectives for the signal processing portion of the project were met successfully. A provisional patent disclosure was also generated regarding the algorithms developed. Hardware was procured and integrated, and algorithm designs were developed, integrated, and tested. While the Sundance hardware proved extremely difficult to integrate and ultimately limited in utility, development boards from Insight were successfully integrated into an experimental system. Communication with these devices proved far more reliable than with the Sundance system, and excellent results were obtained.

A general block diagram of a Wavefront Coded imaging system is shown in Figure 2.1. The figure shows the basic flow of information for both a software-based system as was used in Phase 1 and a hardware-based system developed during this Phase 2. Note that the processor block and frame grabber block exchange places, and software processing is replaced with an FPGA hardware device.

Figure 2.1. Block diagram of two basic kinds of WFC implementations: a software-based system (Phase 1) and the hardware-based system pursued in this Phase 2 effort. Note that in the hardware system the processing occurs before the frame grabber.

Figure 2.2 shows the FPGA platform used to obtain experimental results. While this device was considerably smaller than the Xilinx Virtex-1600E part on the Sundance system (2.1-million gates plus extra on-chip RAM), we were able to run reasonably sized designs in the laboratory. Simulation and design work showed that the 1600E part on the Sundance system can handle 32x32 filter kernels, and such a device is expected to handle up to 40x40. The inability to fully integrate the Sundance 1600E into a working platform precluded experiments with larger kernel sizes.



Figure 2.2. FPGA development board from Insight with a 1-million gate device (Xilinx VirtexII-1000) that was used in laboratory experiments.

<u>Algorithms</u>: The algorithms developed were shown to be scalable to process non-separable kernels of any size and images of any size. In practice the size of the filtering kernels is limited by the resources in the selected FPGA, and also by the ability of the design tools to synthesize a workable core within a reasonable amount of time. Speed limitations on the FPGA devices (typically only hundreds of MHz) limit both the ultimate size of the designs and the throughput of the device.

Cores derived in Phase 2 were loaded onto development systems from Insight and used in CDM's laboratory for testing. The results from the FPGA were found to be identical to those in non-separable software processing, provided all round-off errors can be kept equal. The cores operate at the same frame rate as the camera and provide an output data stream identical in format to the input data stream, enabling smooth integration into existing imaging systems.

The algorithms developed were ultimately migrated to a fully customized design architecture. Cores from Xilinx for multiplication and distributed-arithmetic finite-impulse-response filtering were replaced with custom multiply-add sequences scripted entirely in the hardware description language VHDL. Some Xilinx cores remain in the design for simple structures such as buffers and clocking or control logic.

The basic design concept for a Wavefront Coded reconstruction core consists of a series of buffers and single-element taps. The buffers consist of block RAM elements which delay the pixels appropriately for the image size and kernel size. The tap structures scale their respective input pixel values, and an adder with scaling provides the final tap output. Figure 2.3 shows a flow diagram of the basic algorithm.



Figure 2.3. Block diagram of the basic convolution algorithm. Buffers are used for delay-line structures. The taps perform scaling on the input pixel values and an adder provides the final tap output.

The design is scalable for a variety of reconstruction kernels and image sizes. Larger image sizes increase the requirements for delay lines and hence can increase the core size significantly. Kernel sizes can also scale the design size, so compact kernels are desired in hardware processing as well as software processing. For very large image sizes, off-chip RAM could be utilized rather than internal RAM blocks for performing the delay line.
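The buffer-and-tap structure described above can be modeled in software as follows. This is a minimal behavioral sketch, not the VHDL core developed in this work: the function name `stream_filter`, the zero-initialized delay line, and the tap ordering are assumptions made for illustration, and no final output scaling is applied. It does show why the delay-line length grows with both image width and kernel size.

```python
from collections import deque


def stream_filter(pixels, width, kernel):
    """Filter a raster-scanned pixel stream with a K x K integer kernel.

    One output value is produced per input pixel, so the output stream has
    the same format and rate as the input stream.
    """
    k = len(kernel)
    # Delay line: (k - 1) full image rows plus k pixels of the current row.
    length = (k - 1) * width + k
    delay = deque([0] * length, maxlen=length)
    out = []
    for p in pixels:
        delay.append(p)  # newest pixel enters, oldest falls off the end
        acc = 0
        for r in range(k):
            for c in range(k):
                # Tap (r, c) sees the pixel delayed by r rows and c columns.
                acc += kernel[r][c] * delay[-1 - (r * width + c)]
        out.append(acc)
    return out


# A 4 x 4 test image containing a single impulse at row 1, column 1.
width = 4
pixels = [0] * 16
pixels[1 * width + 1] = 1
kernel = [[1, 2],
          [3, 4]]
out = stream_filter(pixels, width, kernel)
for row in range(4):
    print(out[row * width:(row + 1) * width])
```

Feeding an impulse through the model reproduces the kernel in the output stream, which is a quick way to confirm that the taps are wired to the intended delays.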

The actual coefficient values within the kernels were found to significantly impact the final core size, and competencies for using "special" or efficient coefficient values for Wavefront Coded reconstruction have been developed. Kernels with values that are strictly powers-of-2, or special sums-of-powers-of-2, are examples of "special" kernels.

Using specialized bit-level logic for performing tap scaling allows a cost function to be attached to any coefficient. A simple cost function for a coefficient with value zero might be 0, i.e. no cost, since a zero does not contribute to the computational burden (although it may still demand storage space). For nonzero coefficients, cost functions can be assigned based on simple rules, for example by assuming that negative coefficients are more costly than positive coefficients and that subtraction is more costly than addition.

Based on these rules and a variety of other manipulations that occur at the bit-level, one can generate an example cost table for any set of discrete-valued coefficients. Figure 2.4 below shows the relative costs of integer coefficients 0-64 for an example Wavefront Coded system. From the plot we can see that the most costly coefficient is '59'. Building a filter kernel with a coefficient value of '59' will therefore cost considerably more than using, for example, the "reasonably similar" coefficient '60', which has less than half the cost of '59'. Choosing '64' would be cheaper still.



Figure 2.4. Plot of cost versus positive integer coefficients 0-64.
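To make the idea of a per-coefficient cost concrete, the sketch below scores each integer by its signed power-of-2 decomposition. It is a hypothetical cost model, not the one used to produce Figure 2.4: the names `signed_digits` and `tap_cost`, and the weights of 1 per added term and 2 per subtracted term, are assumptions chosen only to follow the stated rule that subtraction is more costly than addition. It will not reproduce the exact ranking in the figure.

```python
def signed_digits(n):
    """Non-adjacent signed-digit form of a non-negative integer.

    Returns digits in {-1, 0, +1}, least significant first, such that
    n == sum(d * 2**i for i, d in enumerate(digits)).
    """
    digits = []
    while n > 0:
        if n & 1:
            d = 2 - (n & 3)  # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def tap_cost(n, add_cost=1, sub_cost=2):
    """Hypothetical cost of scaling a pixel by n using shifts, adds, subtracts."""
    digits = signed_digits(n)
    return add_cost * digits.count(1) + sub_cost * digits.count(-1)


for value in (0, 59, 60, 64):
    print(value, signed_digits(value), tap_cost(value))
```

Under this model a zero costs nothing, a pure power of 2 such as 64 needs a single shifted term, and 59 (64 - 4 - 1) costs more than 60 (64 - 4), which matches the qualitative behavior described in the text.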

<u>WFCProcGen</u>: An automatic convolution-core generation tool, WFCProcGen, was developed in C++. It allows rapid generation of the VHDL processing core and its bitstream from a few simple parameters. Development of WFCProcGen was initiated after realizing that 'hand-coding' a processor core design of any significant size was time consuming. The tool automates much of the process and reduces design generation time by orders of magnitude.

The parameters available to the designer include image size, image bit-depth, kernel size, coefficient bit-depth, and the integer kernel coefficients themselves. *WFCProcGen* automatically generates necessary Xilinx cores by generating and executing batch file processes that invoke the Xilinx Core Generator, and generates the VHDL code as concurrent processes and a top-level architecture. The VHDL code that is generated is a syntactically correct VHDL "program" or top-level design and architecture, including all necessary signaling and multiply-accumulate logic required for implementing the non-separable convolution algorithms. This top-level design can be integrated with existing VHDL or Verilog applications or can be synthesized and implemented as a stand-alone processor.

WFCProcGen typically continues with the design, including launching the Xilinx tools for synthesis, mapping, and place-and-route. The tool ultimately produces a downloadable bitstream ready for implementation on the FPGA device, based on the user parameters and the kernel that is to be implemented.

#### 3. Example Results

<u>PSFs</u>: Generating a WFC microscope with a hardware processor involves several steps. The Wavefront Coded system is first characterized optically by measuring the point-spread function or PSF, shown in Figure 3.1. This represents the impulse response of the optical system in two dimensions. A reconstruction filter is then derived from the measured PSF data with the (typical) goal of producing a diffraction-limited PSF response in the final image. A least-squares solution is used to find the filter kernel from the sampled data and the desired response.



Figure 3.1. Sampled PSFs for the microscope system.
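The least-squares step can be illustrated in one dimension. The sketch below is not the procedure used to derive the kernels in this report: it is a minimal example that builds the convolution matrix of a sampled PSF and solves the normal equations for the filter taps. The helper names (`convolve`, `solve`, `least_squares_filter`) and the toy PSF values are assumptions, and in practice a library routine such as a linear-algebra least-squares solver would replace the hand-written elimination.

```python
def convolve(a, b):
    """Full 1-D linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def solve(m, rhs):
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def least_squares_filter(psf, desired, taps):
    """Find `taps` coefficients g minimizing ||psf * g - desired||^2."""
    rows = len(psf) + taps - 1
    # Column j of the convolution matrix A is the PSF shifted down j samples.
    a = [[psf[i - j] if 0 <= i - j < len(psf) else 0.0 for j in range(taps)]
         for i in range(rows)]
    # Normal equations: (A^T A) g = A^T d.
    ata = [[sum(a[i][p] * a[i][q] for i in range(rows)) for q in range(taps)]
           for p in range(taps)]
    atd = [sum(a[i][p] * desired[i] for i in range(rows)) for p in range(taps)]
    return solve(ata, atd)


# Toy check: build a desired response from a known filter, then recover it.
psf = [0.25, 0.5, 0.25]
g_true = [-1.0, 3.0, -1.0]
desired = convolve(psf, g_true)
g = least_squares_filter(psf, desired, taps=3)
print([round(x, 6) for x in g])
```

The two-dimensional case follows the same pattern with the PSF and kernel flattened into vectors, at the cost of a much larger convolution matrix.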

<u>Kernels</u>: Two reconstruction filters for this optical configuration were derived and are shown in Figures 3.2 and 3.3. Two distinct filter sizes, 17x17 and 19x19, were designed in order to compare design sizes and filter performance on typical images.



Figure 3.2. 19x19 reconstruction kernel and its frequency response, shown in dB.

The size of the PSF that required reconstruction demanded that the kernel sizes be 17x17 or larger. In general, the larger the depth of field extension, the larger the PSF will be spatially, and the larger the reconstruction kernel must also be. Sizes also depend on magnification and the speed (or numerical aperture) of the objective. The VirtexII-1000 FPGA on the Insight board could handle a kernel no larger than 19x19 for this particular WFC element and the optical configuration selected. While the range of sizes is not extreme, it serves to demonstrate the variability that can be achieved in reconstructions. This variability gives system designers and integrators more options to "choose" their optimal response based on the task they face.



Figure 3.3. 17x17 reconstruction kernel and its frequency response, shown in dB.

<u>Xilinx Design Reports</u>: A design report from the Xilinx synthesis and simulation tools indicates the design's viability and its ability to operate on a specific device, and highlights potential timing problems. An example synthesis report for the 19x19 kernel used to generate the example images is provided in Figure 3.4. Note that on the particular device chosen (the Xilinx VirtexII-1000 on the Insight development board) the kernel consumes nearly 100% of the resources but attains a design speed of 50MHz.

```
Design Summary: 19x19 Kernel
   Number of Slices:                   5,118 out of  5,120   99%
   Number of Slices containing
      unrelated logic:                 1,723 out of  5,118   33%
   Number of Slice Flip Flops:         5,318 out of 10,240   51%
   Total Number 4 input LUTs:          8,760 out of 10,240   85%
   Number of bonded IOBs:                 62 out of    324   19%
   Number of Block RAMs:                  36 out of     40   90%
   Number of GCLKs:                        2 out of     16   12%
Timing summary:
Design statistics:
   Minimum period: 19.880ns (Maximum frequency: 50.302MHz)
```

Figure 3.4. Report from the synthesis tool for a 19x19 kernel on a Xilinx VirtexII-1000 FPGA. Note that nearly 100% of the device is used, and that it can operate with a pixel clock of up to 50MHz.

| Design Summary: 17x17 Kernel                | Used  | Available | Utilization |
|---------------------------------------------|-------|-----------|-------------|
| Number of Slices                            | 5,118 | 5,120     | 99%         |
| Number of Slices containing unrelated logic | 647   | 5,118     | 12%         |
| Number of Slice Flip Flops                  | 4,362 | 10,240    | 42%         |
| Total Number 4 input LUTs                   | 7,589 | 10,240    | 74%         |
| Number of bonded IOBs                       | 54    | 324       | 16%         |
| Number of Block RAMs                        | 32    | 40        | 80%         |
| Number of GCLKs                             | 1     | 16        | 6%          |

Timing summary, design statistics: Minimum period: 17.876ns (Maximum frequency: 55.941MHz)

Figure 3.5. Report from the synthesis tool for a 17x17 kernel on a Xilinx VirtexII-1000 FPGA. Note that the slice utilization is nearly identical to the 19x19 case, and that the 17x17 part can run about 10% faster, with a pixel clock of up to 55MHz.

<u>Results</u>: The cores were loaded onto the Insight development platform and integrated with a Uniq-UP1830 camera, which generates 1024x1024 images at 30 frames/second, corresponding to a 45MHz pixel clock. Images of a roughly spherical diatom were obtained from a commercial microscope and processed in real-time using the two cores. This particular configuration provides a depth of field increase of around 4x the original DOF.
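The synthesis figures and the camera clock can be related to the throughput quoted earlier with a short calculation. This is a rough sketch under two assumptions that the report does not state explicitly: each core produces one output pixel per clock cycle, and each kernel tap counts as two operations (one multiply and one add). The function name `ops_per_second` is illustrative.

```python
def ops_per_second(kernel_size, clock_hz, ops_per_tap=2):
    """Operations per second for a K x K kernel at one output pixel per clock."""
    return kernel_size ** 2 * clock_hz * ops_per_tap


# Maximum clock frequencies from the synthesis reports (Figures 3.4 and 3.5).
max_clock_hz = {19: 50_302_000, 17: 55_941_000}
camera_clock_hz = 45_000_000  # pixel clock of the camera used in the experiments

for k, fmax in max_clock_hz.items():
    peak = ops_per_second(k, fmax)
    at_camera = ops_per_second(k, camera_clock_hz)
    print(f"{k}x{k}: peak {peak / 1e9:.1f} Gops/s, "
          f"at camera clock {at_camera / 1e9:.1f} Gops/s, "
          f"timing margin {fmax / camera_clock_hz:.2f}x")
```

Under these assumptions both cores exceed 30 billion operations per second at their maximum clock rates, and both maximum frequencies sit above the 45MHz camera clock, which is consistent with real-time operation at 30 frames/second.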

Figures 3.6 through 3.8 provide an example of the capability of Wavefront Coding for depth of field extension and highlight the reconstruction quality for each filter.



Figure 3.6. Traditional microscope images. Only an annular region with a thickness of about 1/4 the radius of the object is in focus. The in-focus annulus is the ring of highest contrast, roughly centered between the middle of the object and its edge.



Figure 3.7. Wavefront Coded microscope image, reconstructed at 30 frames/second with a 19x19 sized kernel that had a noise gain of 2.2. Note the clear depth of field extension in the Wavefront Coded image. The depth extension is approximately 4x.



Figure 3.8. Wavefront Coded microscope image, reconstructed at 30 frames/second with a 17x17 sized kernel that had a noise gain of 1.2. The depth extension is approximately 4x. The lower contrast is due to the shorter filter and smaller noise gain.

The results in Figures 3.7 and 3.8 highlight both features and drawbacks of the Wavefront Coded microscope. The obvious advantage is the 4x depth of field extension as compared to the traditional image in Figure 3.6. The slight shading to the right of and below the object is expected to be reduced or eliminated with new designs of Wavefront Coding elements.

The FPGA core is designed to process non-separable signals, and a matched non-separable optical design generates more symmetric anomalies than separable designs. Future Wavefront Coded designs are expected to be symmetric and less sensitive to other optical variations, both of which would greatly improve the results available through non-separably reconstructed images.

## **Publications**

#### 1. Technical Reports to ARO

Johnson, G., Macon, A., Cathey, T., "Wavefront Coded Microscope and Real-Time Processor – Interim Report", *Interim Report to ARO*, March, 2001.

Johnson, G., Macon, A., Cathey, T., "Wavefront Coded Microscope and Real-Time Processor – Interim Report", *Interim Report to ARO*, June, 2001.

Johnson, G., Macon, A., Cathey, T., "Wavefront Coded Microscope and Real-Time Processor – Interim Report", *Interim Report to ARO*, September, 2001.

Johnson, G., Macon, A., Cathey, T., "Wavefront Coded Microscope and Real-Time Processor – Interim Report", *Interim Report to ARO*, December, 2001.

Johnson, G., Macon, A., Cathey, T., "Wavefront Coded Microscope and Real-Time Processor – Interim Report", *Interim Report to ARO*, March, 2002.

Johnson, G., Macon, A., Cathey, T., "Wavefront Coded Microscope and Real-Time Processor – Interim Report", *Interim Report to ARO*, June, 2002.

Johnson, G., Macon, A., Cathey, T., "Wavefront Coded Microscope and Real-Time Processor – Interim Report", *Interim Report to ARO*, September, 2002.

Johnson, G., Macon, A., Cathey, T., "Wavefront Coded Microscope and Real-Time Processor – Interim Report", *Interim Report to ARO*, December, 2002.

#### 2. Technical Paper Submissions

Johnson, G., Macon, A., Chumachenko, V., "Real-Time Wavefront Coded Microscopy", *Submitted for publication*, Digital Photography Conference of the Imaging Science and Technology, November, 2002.

## Scientific Personnel

Dr. Gregory E. Johnson (CDM Optics, Inc.)

Dr. Edward R. Dowski (CDM Optics, Inc.)

Ash K. Macon (CDM Optics, Inc.)

Dr. Tom Cathey (University of Colorado, Imaging Systems Laboratory)

Roman Novoselov (University of Colorado, Imaging Systems Laboratory)

Sreeram Vaidyanathan (University of Colorado, Imaging Systems Laboratory)

## Report of Inventions

### 1. New Wavefront Coding Imaging Systems with Optimized Image Processing

In the first-quarter report in 2002, CDM reported that it had filed a provisional patent entitled "New Wavefront Coding Imaging Systems with Optimized Image Processing". This provisional patent describes several new forms of processing and optimized reconstruction kernels.